The Design and Use of a Latin Dependency Treebank

Author

  • David Bamman
Abstract

While much of the research and labor in treebanks has focused on modern languages, recent scholarship has also seen the rise of treebanks for historical languages, such as Middle English (Kroch and Taylor [15]), Early Modern English (Kroch et al. [16]), Old English (Taylor et al. [28]), Early New High German (Demske et al. [11]) and Medieval Portuguese (Rocio et al. [27]). Like their modern counterparts, these historical treebanks serve two distinct ends and often two different audiences: they provide crucial datasets for NLP projects such as automatic parsing and grammar induction, and they provide a valuable corpus for scholars researching the state of a language and its progression across time. Historical treebanks, however, also offer one additional benefit over modern treebanks: they provide an annotated set of texts that scholars actually care about. When linguists of modern languages base theories on corpus evidence, their analysis is generally directed toward the language at large; few, if any, pore over the Wall Street Journal examining its use of an arcane literary device. If the corpus is Vergil, however, we do. The sheer volume of Latin texts available electronically, not to mention the enormous mass still locked in print, is much larger than the small community of scholars and students who can read it. This alone justifies a treebank as a resource for those attempting to learn the language, but it also highlights the need for automatic methods of parsing and machine translation. To this end a Latin treebank will serve the NLP community well, since that community has a long history of applying such research to modern languages. Classical scholars, however, largely operate on a fixed canon of texts. The value of a treebank for them is not so much in training

Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Automatic Conversion of a Persian Dependency Treebank to a Constituency Treebank

There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, no large constituency treebank is available for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...

Generating a Persian Constituency Treebank by Automatic Conversion

Treebanks are among the important and useful resources for natural language processing tasks. Dependency structure and phrase structure are the two best-known kinds of treebank. Many efforts have already been made to convert dependency structure to phrase structure. In this paper we study an approach to converting dependency structure to phrase structure, motivated by the lack of a large phrase-structure treebank for Persian. A...

Porting an Ancient Greek and Latin Treebank

We have recently converted a dependency treebank, consisting of ancient Greek and Latin texts, from one annotation scheme to another that was independently designed. This paper makes two observations about this conversion process. First, we show that, despite significant surface differences between the two treebanks, a number of straightforward transformation rules yield a substantial level of ...

The Annotation Guidelines of the Latin Dependency Treebank and Index Thomisticus Treebank: the Treatment of some specific Syntactic Constructions in Latin

The paper describes the treatment of some specific syntactic constructions in two treebanks of Latin according to a common set of annotation guidelines. Both projects work within the theoretical framework of Dependency Grammar, which has been demonstrated to be an especially appropriate framework for the representation of languages with a moderately free word order, where the linear order of co...

Publication date: 2006